Mini Project 3: Visualizing and Maintaining the Green Canopy of NYC
📚Introduction
Many New Yorkers do not appreciate the trees that benefit them and their environment every day. Over 1 million trees (specifically 1,093,439) are spread across the Big Apple, yet litter is scattered around many of them. People who litter rarely consider that these trees absorb CO2, shelter birds and squirrels, and provide shade while capturing the sunlight they need to grow.
While this project is not meant to start a “stop litter” movement, it analyzes trees and their corresponding council districts to make a proposal to the NYC Parks Department. Specifically, the goal is to propose a new program for one district, using visualizations built from official NYC open data to show why action must be taken on that district's trees.
Setting up code libraries
# Below are the libraries used for this project.
# Obtaining data and performing SQL-like commands
library(sf)
library(tidyverse)
library(httr2)
# Data ingestion
library(glue)
library(readxl)
library(tidycensus)
# Display data tables
library(DT)
# Visualization libraries
library(ggplot2)
library(plotly)
library(tidyr)
💽Download NYC City Council District Boundaries
Data was collected from the NYC Department of City Planning using release 25C, the latest available when this project was made. The version clipped to the shoreline is used rather than the version that includes water areas, since trees grow on land and the clipped boundaries keep each district's area tied to land.
Downloading the Boundary Data
# The following code was inspired by how data was ingested in mp02
# Create a directory to store the data, if it does not exist already
if (!dir.exists(file.path("data", "mp03"))) {
  dir.create(file.path("data", "mp03"), showWarnings = FALSE, recursive = TRUE)
}

library <- function(pkg) {
  ## Mask base::library() to automatically install packages if needed
  ## Masking is important here so downlit picks up packages and links
  ## to documentation
  pkg <- as.character(substitute(pkg))
  options(repos = c(CRAN = "https://cloud.r-project.org"))
  if (!require(pkg, character.only = TRUE, quietly = TRUE)) install.packages(pkg)
  stopifnot(require(pkg, character.only = TRUE, quietly = TRUE))
}

# Zip file name, used to check whether the download already exists
zip_name <- "nycc_25c.zip"
url_path <- "https://s-media.nyc.gov/agencies/dcp/assets/files/zip/data-tools/bytes/city-council/nycc_25c.zip"
# Directory the zip file is saved to
zip_path <- "./data/mp03/"

# Download the file into the correct directory only if it is not already there
if (!file.exists(paste0(zip_path, zip_name))) {
  download.file(url = url_path, destfile = paste0(zip_path, zip_name), mode = "wb")
}

unzipped_pathname <- paste0(zip_path, "nycc_25c/")
# Unzip the file if necessary
if (!dir.exists(unzipped_pathname)) {
  unzip(paste0(zip_path, zip_name), exdir = zip_path, overwrite = TRUE)
}

# Read the shapefile and store it in DATA
DATA <- sf::st_read(paste0(unzipped_pathname, "nycc.shp"))
# Transform the result into WGS 84
DATA <- st_transform(DATA, crs = "WGS84")
Raw District Boundary Data Output
# Return the transformed DATA to the user
datatable(DATA, style = "bootstrap5", caption = "Raw Data Output")
Explaining the Table
Note: column names were left untouched to show the raw data, so they may be difficult to understand at first glance.
The datatable may look intimidating, but it provides important information used later on. Most notable are the columns Shape_Leng, the length of each district's boundary (its perimeter), and Shape_Area, the area each district covers. There are currently 51 districts to work with.
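Shape_Leng and Shape_Area come precomputed in the shapefile. To show what these two columns measure, here is a minimal base-R sketch of the shoelace formula for the area and perimeter of a single polygon ring in projected coordinates. The helper polygon_area_perimeter() is hypothetical; real council districts can be multipolygons with holes, which sf::st_area() handles properly.

```r
# Hypothetical helper: planar area and perimeter of one simple polygon ring.
# Coordinates must be projected; results are in the CRS units (e.g. feet).
polygon_area_perimeter <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 3)
  # Index of the "next" vertex, wrapping the last vertex back to the first
  nxt <- c(seq_along(x)[-1], 1)
  # Shoelace formula: half the absolute sum of the cross products
  area <- abs(sum(x * y[nxt] - x[nxt] * y)) / 2
  # Perimeter: sum of the lengths of the edges between consecutive vertices
  perimeter <- sum(sqrt((x[nxt] - x)^2 + (y[nxt] - y)^2))
  c(area = area, perimeter = perimeter)
}

# A 2 x 3 rectangle: area 6, perimeter 10
polygon_area_perimeter(c(0, 2, 2, 0), c(0, 0, 3, 3))
```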
Data Made Easier
The map below makes it much easier to see which areas are being analyzed. It shows the five boroughs of New York City, with each outlined boundary marking one council district.
# Visualization of the area being worked on
ggplot() +
  geom_sf(data = DATA, mapping = aes(geometry = geometry)) +
  theme_bw()
💽Download NYC Tree Points
Since this project focuses on trees, the locations of individual trees are the main data source. The code below downloads the necessary data.
Downloading the Tree Data
# The following code is a modified version of the data acquisition code from
# https://michael-weylandt.com/STA9750/archive/AY-2024-SPRING/miniprojects/mini01.html
if (!file.exists("data/mp03/nyc_tree_locations.csv")) {
  # URL was modified as per instructions
  ENDPOINT <- "https://data.cityofnewyork.us/resource/hn5i-inap.geojson"
  # Edit if visuals start to take long to compute. Same with OFFSET.
  BATCH_SIZE <- 50000
  OFFSET <- 0
  END_OF_EXPORT <- FALSE
  ALL_DATA <- list()

  while (!END_OF_EXPORT) {
    cat("Requesting items", OFFSET, "to", BATCH_SIZE + OFFSET, "\n")

    req <- request(ENDPOINT) |>
      req_url_query(`$limit` = BATCH_SIZE, `$offset` = OFFSET)
    resp <- req_perform(req)
    batch_data <- st_read(resp_body_string(resp))
    # batch_data <- fromJSON(resp_body_string(resp))

    ALL_DATA <- c(ALL_DATA, list(batch_data))
    if (NROW(batch_data) != BATCH_SIZE) {
      END_OF_EXPORT <- TRUE
      cat("End of Data Export Reached\n")
    } else {
      OFFSET <- OFFSET + BATCH_SIZE
    }
  }
  ALL_DATA <- bind_rows(ALL_DATA)
  cat("Data export complete:", NROW(ALL_DATA), "rows and", NCOL(ALL_DATA), "columns.")

  write_csv(ALL_DATA, "data/mp03/nyc_tree_locations.csv")
}
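The download loop above pages through the API with the `$limit` and `$offset` query parameters until a batch comes back smaller than BATCH_SIZE. The sketch below isolates that stopping rule in base R, using a hypothetical fake_fetch() in place of the real HTTP request so it runs offline.

```r
# Hypothetical stand-in for one API call: returns rows offset+1 .. offset+limit
# of a fake table, mimicking the $limit / $offset query parameters.
fake_fetch <- function(limit, offset, total_rows = 120) {
  start <- offset + 1
  end <- min(offset + limit, total_rows)
  if (start > end) return(data.frame(id = integer(0)))
  data.frame(id = start:end)
}

# Request batches until one comes back short (or empty)
fetch_all <- function(fetch, batch_size = 50) {
  offset <- 0
  batches <- list()
  repeat {
    batch <- fetch(limit = batch_size, offset = offset)
    batches[[length(batches) + 1]] <- batch
    # A batch smaller than batch_size means the end of the export was reached
    if (nrow(batch) < batch_size) break
    offset <- offset + batch_size
  }
  list(data = do.call(rbind, batches), requests = length(batches))
}

res <- fetch_all(fake_fetch, batch_size = 50)
nrow(res$data)  # 120 rows
res$requests    # 3 requests: 50 + 50 + 20 rows
```

When the total row count is an exact multiple of the batch size, the loop makes one extra request that returns zero rows, which is harmless.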
🗺️Mapping NYC Trees
Now that the necessary data has been collected, a visualization will be made to display:
Number of trees in each district
Exact locations of trees
Health of each tree
The visualization serves as a starting point for deciding which area(s) should be addressed, and for what reasons.
Creating graph
# Read in the data from the downloaded files
boundaries <- st_read("./data/mp03/nycc_25c")
tree_data <- read.csv("./data/mp03/nyc_tree_locations.csv", stringsAsFactors = FALSE) |>
  filter(!is.na(tpcondition), !is.na(geometry)) |>
  # Rename the column to be easier to understand on the interactive visualization
  rename("Condition" = tpcondition)

# Parse the "c(lon, lat)" string
tree_data_parsed <- tree_data |>
  mutate(coord_str = trimws(gsub("c\\(|\\)", "", geometry))) |>  # Remove "c(" and ")"
  separate_wider_delim(coord_str, delim = ",", names = c("x", "y"), too_few = "align_start") |>
  mutate(x = as.numeric(x), y = as.numeric(y))

# Create the sfc geometry column
tree_data$geometry <- st_as_sfc(paste0("POINT(", tree_data_parsed$x, " ", tree_data_parsed$y, ")"))
# Convert to sf with WGS 84 longitude/latitude coordinates
tree_data <- st_as_sf(tree_data)
st_crs(tree_data) <- 4326

# Join the boundary and tree data
all_data <- st_transform(tree_data, st_crs(boundaries))
all_data <- st_join(all_data, boundaries)
all_data_small <- all_data |> slice_head(n = 30000)

# Used for later questions
# Count trees per district
tree_counts <- all_data |>
  group_by(CounDist) |>
  summarise(tree_count = n(), .groups = "drop")

# Add the counts to the boundaries dataset by district number
# (an attribute join avoids duplicating polygons that touch neighbouring districts' points)
boundaries <- boundaries |>
  left_join(st_drop_geometry(tree_counts), by = "CounDist")

# Store the plot in a variable to make it interactive in the next code block
tree_plot <- ggplot() +
  geom_sf(data = boundaries, mapping = aes(geometry = geometry, fill = tree_count)) +
  scale_fill_gradient(low = "#F0FFF0", high = "#084511", name = "Tree Count") +
  geom_sf(data = all_data_small, mapping = aes(geometry = geometry, color = Condition),
          alpha = 0.5, size = 0.3) +
  labs(color = "Condition",
       title = "Street Trees in NYC by City Council District",
       subtitle = "Points represent the trees, shading shows the tree count per district") +
  guides(color = guide_legend(override.aes = list(size = 3))) +
  theme_bw()

tree_plot
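The graph code recovers coordinates from strings shaped like "c(lon, lat)" in the cached CSV. A minimal base-R sketch of that parsing step, assuming every geometry value follows that exact pattern (the helper name parse_coord_strings() is hypothetical):

```r
# Hypothetical helper: turn strings like "c(-73.95, 40.78)" into numeric x/y columns.
parse_coord_strings <- function(s) {
  # Strip the leading "c(" and the trailing ")"
  inner <- gsub("^\\s*c\\(|\\)\\s*$", "", s)
  # Split "lon, lat" on the comma; as.numeric() tolerates the leftover spaces
  parts <- strsplit(inner, ",", fixed = TRUE)
  data.frame(
    x = as.numeric(vapply(parts, `[`, character(1), 1)),
    y = as.numeric(vapply(parts, `[`, character(1), 2))
  )
}

parse_coord_strings(c("c(-73.95, 40.78)", "c(-74.1, 40.6)"))
```

Entries missing the second coordinate come back as NA rather than raising an error. After dropping rows with missing coordinates, sf::st_as_sf(df, coords = c("x", "y"), crs = 4326) could build the points directly instead of pasting WKT strings.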
# Make the plot interactive using plotly
ggplotly(tree_plot)
Notes on the Visualization
Note: due to hardware limitations, only the first 30,000 trees are plotted as points. The statements below reflect only this sample and could change with the full dataset.
Among the five boroughs, Staten Island has the darkest shading (the highest tree counts), yet most of its plotted trees have an unknown or dead status. The Bronx has many trees rated in excellent condition, possibly because it is far from JFK Airport and lies at the city's northern edge. Manhattan also has many trees north of its southernmost district, which may reflect deliberate planting or trees used as decoration to attract tourists. This is an interactive graph, so explore other areas to find different results!
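Because slice_head() keeps the first 30,000 rows, the plotted trees follow whatever order the API returned them in, which may not be spread evenly across the city. A uniform random sample is one way to keep the map light while making it more representative. A minimal base-R sketch (the seed value is arbitrary, and plot_rows is a hypothetical name):

```r
set.seed(9750)        # any fixed seed makes the sample reproducible
n_total <- 1093439    # trees in the full dataset, as stated in the introduction
n_plot  <- 30000      # same sample size as the map above

# Row indices to plot, drawn uniformly at random without replacement
plot_rows <- sample.int(n_total, n_plot)
```

With dplyr, `all_data |> slice_sample(n = 30000)` draws the same kind of sample directly from the joined data.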
🌲District-Level Analyses of Trees
With the tree points and district boundaries now joined into one data table, more analysis can be done beyond the visualization. For instance, it is much easier to determine which district has the most trees directly, without second-guessing an answer read off the map.
Note that all trees will be included in the following analyses.
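Connecting each tree to a district is a point-in-polygon test, which st_join() performs with its default st_intersects predicate. The base-R sketch below illustrates the idea with the classic ray-casting (even-odd) rule for one simple polygon; it is a simplified illustration, not how sf implements the join, and point_in_polygon() is a hypothetical name.

```r
# Ray casting: count how many polygon edges a horizontal ray from (px, py) crosses.
# An odd number of crossings means the point is inside.
point_in_polygon <- function(px, py, x, y) {
  n <- length(x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does edge (x[j], y[j]) -> (x[i], y[i]) straddle the ray's y-level?
    if ((y[i] > py) != (y[j] > py)) {
      # x-coordinate where the edge meets the horizontal line y = py
      x_cross <- x[j] + (py - y[j]) * (x[i] - x[j]) / (y[i] - y[j])
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Unit square as a toy "district"
sq_x <- c(0, 1, 1, 0)
sq_y <- c(0, 0, 1, 1)
point_in_polygon(0.5, 0.5, sq_x, sq_y)  # TRUE
point_in_polygon(1.5, 0.5, sq_x, sq_y)  # FALSE
```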
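# Remove datasets that repeat the tree data, plus values that are no longer needed.
# intersect() skips objects that do not exist (e.g. ALL_DATA when the CSV was already cached).
rm(list = intersect(c("tree_data", "tree_data_parsed", "unzipped_pathname", "url_path", "DATA",
                      "boundaries", "zip_name", "zip_path", "ALL_DATA"), ls()))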
Finding District with Most Trees
District with most trees
# Find the district with the most trees
tree_counts <- all_data |>
  group_by(CounDist) |>
  summarise(tree_count = n(), .groups = "drop") |>
  mutate(Borough = case_when(
    CounDist >= 1 & CounDist <= 10 ~ "Manhattan",
    CounDist >= 11 & CounDist <= 18 ~ "Bronx",
    CounDist >= 19 & CounDist <= 32 ~ "Queens",
    CounDist >= 33 & CounDist <= 48 ~ "Brooklyn",
    CounDist >= 49 & CounDist <= 51 ~ "Staten Island",
    TRUE ~ NA_character_
  )) |>
  arrange(desc(tree_count))

# Create a format_titles() helper to make table column names look nicer. Used in later chunks.
# Credit: Professor Michael Weylandt
library(stringr)
format_titles <- function(df) {
  colnames(df) <- str_replace_all(colnames(df), "_", " ") |> str_to_title()
  df
}

tree_counts |>
  st_drop_geometry() |>
  slice_head(n = 10) |>
  select(CounDist, Borough, tree_count) |>
  format_titles() |>
  rename("Council District" = Coundist) |>
  datatable(style = "bootstrap5", caption = "Top 10 Districts With The Most Trees")
Findings
Council District 51 in Staten Island has the most trees, with 70,965 recorded. Staten Island also ranks 2nd and 6th, so all three of its districts appear in the top 10. This suggests the borough has a very large number of trees, although its districts are also large, so a high count does not necessarily mean a high density.
Several Queens districts also appear in the list, suggesting there is a good chance of seeing trees in whichever Queens neighborhood one enters.
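The ranking above counts trees per district and labels each district with a borough based on its number. The same two steps in base R, on hypothetical toy data, using cut() with the district-number ranges from the case_when() call (borough_of() is a hypothetical helper):

```r
# Hypothetical toy input: one row per tree with its council district
trees <- data.frame(CounDist = c(51, 51, 50, 7, 7, 7, 32, 40))

# Count trees per district, largest first
counts <- as.data.frame(table(CounDist = trees$CounDist), responseName = "tree_count")
counts$CounDist <- as.integer(as.character(counts$CounDist))
counts <- counts[order(-counts$tree_count), ]
rownames(counts) <- NULL

# Right-closed intervals (0,10], (10,18], ... match the case_when() ranges
borough_of <- function(d) {
  cut(d, breaks = c(0, 10, 18, 32, 48, 51),
      labels = c("Manhattan", "Bronx", "Queens", "Brooklyn", "Staten Island"))
}
counts$Borough <- as.character(borough_of(counts$CounDist))
counts
```

The lookup is approximate: a few districts cross borough lines, such as District 8, which covers parts of both Manhattan and the Bronx.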
District with Highest Tree Density
# Use the Shape_Area column to compute tree density per district.
# Shape_Area is in the units of the source projection (square feet for the NY State Plane
# projection these DCP files use), so dividing by 1e6 gives millions of square feet.
density_trees <- all_data |>
  st_drop_geometry() |>
  group_by(CounDist) |>
  summarise(
    Shape_Area = first(Shape_Area),  # every tree in a district carries the same Shape_Area
    .groups = "drop"
  ) |>
  left_join(
    tree_counts |>
      st_drop_geometry() |>
      select(CounDist, tree_count, Borough) |>
      distinct(CounDist, .keep_all = TRUE),  # Remove duplicate CounDist rows
    by = "CounDist"
  ) |>
  mutate(
    area_million_sqft = as.numeric(Shape_Area) / 1e6,
    tree_density = tree_count / area_million_sqft
  ) |>
  arrange(desc(tree_density)) |>
  drop_na() |>
  select(CounDist, Borough, tree_count, area_million_sqft, tree_density)

density_trees |>
  format_titles() |>
  rename("Council District" = Coundist) |>
  rename("Area (million sq ft)" = "Area Million Sqft") |>
  rename("Trees per Million sq ft" = "Tree Density") |>
  datatable(style = "bootstrap5", caption = "Districts Ranked by Tree Density (Highest First)") |>
  formatRound(c("Area (million sq ft)", "Trees per Million sq ft"), digits = 3)
Findings
Council District 7 in Manhattan has the highest tree density, with 283.549 trees per million square feet of district area (roughly 3,050 trees per square kilometre, assuming Shape_Area is in square feet). District 7 is the 4th smallest district in NYC, yet it holds roughly 15,000 trees, packing more trees into its area than any other district. By comparison, District 50 in Staten Island, the largest district, has a density of only about 78 trees per million square feet (roughly 840 per square kilometre), largely because of its size.
Manhattan appears 5 times in the top 10. The borough is known for fitting as much as it can into limited space, and that same pattern may explain why its districts do so well in this category.
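The density figures above depend on the units of Shape_Area. Assuming the shapefile uses the New York State Plane projection in feet (st_crs() on the boundary data would confirm this), Shape_Area / 1e6 is in millions of square feet, and a density can be converted to trees per square kilometre as sketched below. The constant and helper names are illustrative, not from the original analysis.

```r
# Assumption: Shape_Area is in square feet. The State Plane CRS uses US survey feet,
# which differ from the international foot (exactly 0.3048 m) by about 2 parts per
# million, a negligible difference here.
SQFT_PER_SQKM <- 1e6 / 0.3048^2  # about 10,763,910 sq ft in one sq km

sqft_to_sqkm <- function(area_sqft) area_sqft / SQFT_PER_SQKM

# Trees per million sq ft -> trees per sq km
per_million_sqft_to_per_sqkm <- function(density) density * SQFT_PER_SQKM / 1e6

per_million_sqft_to_per_sqkm(283.549)  # District 7: about 3,052 trees per sq km
per_million_sqft_to_per_sqkm(78)       # District 50: about 840 trees per sq km
```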
District with Highest Share of Dead Trees
# Calculate statistics for dead trees
dead_trees <- all_data |>
  st_drop_geometry() |>
  filter(!is.na(Condition), !is.na(CounDist)) |>
  group_by(CounDist) |>
  summarize(
    total_trees = n(),
    total_dead_trees = sum(Condition == "Dead", na.rm = TRUE),
    fraction_dead_trees = total_dead_trees / total_trees * 100,
    .groups = "keep"
  ) |>
  left_join(
    tree_counts |>
      st_drop_geometry() |>
      select(CounDist, Borough) |>
      distinct(CounDist, .keep_all = TRUE),
    by = "CounDist"
  ) |>
  select(CounDist, Borough, total_trees, total_dead_trees, fraction_dead_trees) |>
  arrange(desc(fraction_dead_trees))

dead_trees |>
  rename("Council District" = CounDist) |>
  format_titles() |>
  rename("Fraction Dead Trees %" = "Fraction Dead Trees") |>
  datatable(style = "bootstrap5", caption = "Dead Tree Data") |>
  formatRound("Fraction Dead Trees %", digits = 3)
Findings
Council District 32 in Queens has the highest share of dead trees, with about 14.255% of its trees recorded as dead. One possible reason is that Queens generally does not receive the attention Manhattan does, and as a very large borough it requires more maintenance. District 32 also lands in the top 10 for total trees, so fixing its dead trees would be a large amount of work.
Interestingly, no Brooklyn district appears near the top of this ranking, suggesting that Brooklyn either has fewer trees to look after than Queens or is able to maintain them more effectively.
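The ranking above uses the share of dead trees in each district rather than the raw count, so a district is not ranked higher simply for having more trees. A minimal base-R sketch of that calculation on hypothetical toy data, dropping trees with no recorded condition as the analysis does (dead_share is a hypothetical name):

```r
# Hypothetical toy input: one row per tree with its district and recorded condition
trees <- data.frame(
  CounDist  = c(32, 32, 32, 32, 7, 7, 7, 7, 7),
  Condition = c("Dead", "Good", "Dead", NA, "Good", "Fair", "Dead", "Good", "Good")
)

# Drop trees without a recorded condition
trees <- trees[!is.na(trees$Condition), ]

# Per district: total trees, dead trees, and percentage of trees that are dead
dead_share <- do.call(rbind, lapply(split(trees, trees$CounDist), function(d) {
  data.frame(
    CounDist = d$CounDist[1],
    total_trees = nrow(d),
    total_dead_trees = sum(d$Condition == "Dead"),
    pct_dead = 100 * mean(d$Condition == "Dead")
  )
}))
dead_share <- dead_share[order(-dead_share$pct_dead), ]
rownames(dead_share) <- NULL
dead_share
```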
The most common tree species in Manhattan is the Thornless honeylocust, with 17,310 appearances. It appears to be very common across Manhattan, as the next most common species, the London planetree, has about 6,000 fewer appearances. Counts for the remaining species quickly drop to four digits and then three, suggesting the Thornless honeylocust may live longer, adapt better to Manhattan's dense urban conditions, and thrive compared to other species. More statistics would be needed to verify such a claim.
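The species counts above are not accompanied by code in this section. A minimal base-R sketch of how the most common species in one borough could be tallied, on hypothetical toy data; the column names borough and species and the helper top_species() are placeholders, not the tree dataset's actual field names:

```r
# Hypothetical toy input; the real tree data may name these columns differently
trees <- data.frame(
  borough = c("Manhattan", "Manhattan", "Manhattan", "Manhattan", "Queens"),
  species = c("Thornless honeylocust", "London planetree", "Thornless honeylocust",
              "Pin oak", "London planetree")
)

# Tally species within one borough and keep the n most common
top_species <- function(df, which_borough, n = 3) {
  counts <- table(df$species[df$borough == which_borough])
  head(sort(counts, decreasing = TRUE), n)
}

top <- top_species(trees, "Manhattan")
top
```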